EDA for Consumer Discretionary Sector Fund

The Consumer Discretionary Sector Fund (XLY) is an exchange-traded fund (ETF) that aims to track the performance of companies within the consumer discretionary sector. This sector includes businesses that offer non-essential goods and services such as apparel, leisure, media, and retail. The XLY fund invests in companies such as Amazon, Walt Disney, Nike, and Home Depot, among others. The consumer discretionary sector is known for being highly sensitive to economic conditions and consumer sentiment, and as such, it tends to be more volatile than other sectors. This makes the XLY fund a popular choice among investors looking to take on higher levels of risk in pursuit of potentially higher returns.

Time Series Plot
Code
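# load the packages used on this page
library(quantmod)   # getSymbols(), BBands() (via TTR)
library(plotly)     # interactive candlestick chart
library(dplyr)      # %>%, mutate(), group_by(), summarize()
library(lubridate)  # month(), year()
library(forecast)   # autoplot(), ma(), ggAcf(), ggPacf(), gglagplot(), ggseasonplot()
library(ggplot2)    # ggtitle(), xlab(), ylab()
library(gridExtra)  # grid.arrange()
library(TSstudio)   # ts_lags(), ts_heatmap()

# get data
options("getSymbols.warning4.0"=FALSE)
options("getSymbols.yahoo.warning"=FALSE)

# getSymbols() creates the xts object XLY in the workspace and returns its name
data = getSymbols("XLY",src='yahoo', from = '2010-01-01',to = "2023-03-01")

# convert the xts object to a data frame with an explicit Date column
df <- data.frame(Date=index(XLY),coredata(XLY))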

# create Bollinger Bands
bbands <- BBands(XLY[,c("XLY.High","XLY.Low","XLY.Close")])

# join and subset data
df <- subset(cbind(df, data.frame(bbands[,1:3])), Date >= "2010-01-01")

#export the data 
XLY_data <- df
write.csv(XLY_data, "DATA/CLEANED DATA/XLY_raw_data.csv", row.names=FALSE)

# direction column (increasing or decreasing day) used to color the volume bars
df$direction <- ifelse(df$XLY.Close >= df$XLY.Open, 'Increasing', 'Decreasing')

i <- list(line = list(color = '#8B3A3A'))
d <- list(line = list(color = '#7F7F7F'))

# plot candlestick chart

fig <- df %>% plot_ly(x = ~Date, type="candlestick",
          open = ~XLY.Open, close = ~XLY.Close,
          high = ~XLY.High, low = ~XLY.Low, name = "XLY",
          increasing = i, decreasing = d) 
fig <- fig %>% add_lines(x = ~Date, y = ~up , name = "B Bands",
            line = list(color = '#ccc', width = 0.5),
            legendgroup = "Bollinger Bands",
            hoverinfo = "none", inherit = F) 
fig <- fig %>% add_lines(x = ~Date, y = ~dn, name = "B Bands",
            line = list(color = '#ccc', width = 0.5),
            legendgroup = "Bollinger Bands", inherit = F,
            showlegend = FALSE, hoverinfo = "none") 
fig <- fig %>% add_lines(x = ~Date, y = ~mavg, name = "Mv Avg",
            line = list(color = '#E377C2', width = 0.5),
            hoverinfo = "none", inherit = F) 
fig <- fig %>% layout(yaxis = list(title = "Price"))

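# plot volume bar chart
# colors are named so that each direction gets the same color as in the candlestick chart
fig2 <- df 
fig2 <- fig2 %>% plot_ly(x=~Date, y=~XLY.Volume, type='bar', name = "XLY Volume",
          color = ~direction, colors = c(Increasing = '#8B3A3A', Decreasing = '#7F7F7F')) 
fig2 <- fig2 %>% layout(yaxis = list(title = "Volume"))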

# create rangeselector buttons
rs <- list(visible = TRUE, x = 0.5, y = -0.055,
           xanchor = 'center', yref = 'paper',
           font = list(size = 9),
           buttons = list(
             list(count=1,
                  label='RESET',
                  step='all'),
             list(count=3,
                  label='3 YR',
                  step='year',
                  stepmode='backward'),
             list(count=1,
                  label='1 YR',
                  step='year',
                  stepmode='backward'),
             list(count=1,
                  label='1 MO',
                  step='month',
                  stepmode='backward')
           ))

# subplot with shared x axis
fig <- subplot(fig, fig2, heights = c(0.7,0.2), nrows=2,
             shareX = TRUE, titleY = TRUE)
fig <- fig %>% layout(title = paste("Consumer Discretionary Sector Fund Stock Price: JAN 2010 - March 2023"),
         xaxis = list(rangeselector = rs),
         legend = list(orientation = 'h', x = 0.5, y = 1,
                       xanchor = 'center', yref = 'paper',
                       font = list(size = 10),
                       bgcolor = 'transparent'))

fig

The Consumer Discretionary Sector Fund (XLY) shows an overall upward trend from 2010 to March 2023, with some fluctuations along the way. The most notable one occurred in early 2020 during the COVID-19 pandemic, when XLY declined significantly as people spent less on discretionary items due to economic uncertainty and lockdowns. It has since recovered and reached new highs.

The overall upward trend may be partly attributed to rising consumer spending on discretionary items over the years, supported by a growing economy and increasing disposable income. The rise of e-commerce and online shopping has also contributed to the sector’s growth, with many consumers opting for the convenience and accessibility of shopping online. Competition within the sector and changing consumer trends can also drive the fund’s fluctuations.

For stock prices, a multiplicative decomposition is typically preferred because the percentage changes in stock prices tend to be more important than the absolute changes. Additionally, stock prices tend to exhibit non-constant variance, meaning that the variance of the series changes over time. A multiplicative decomposition can handle this non-constant variance more effectively than an additive decomposition.
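As a minimal sketch of the difference between the two decomposition types, the example below builds a synthetic monthly series (an assumption for illustration, not the XLY data) whose seasonal swings grow with its level, decomposes it multiplicatively with base R's `decompose()`, and checks that the components multiply back to the original series. It also shows the equivalent view that taking logs turns the multiplicative model into an additive one.

```r
# Synthetic series (assumed for illustration, not XLY data):
# exponential trend x seasonal factor x positive noise
set.seed(1)
n <- 120
trend <- 50 * exp(0.01 * seq_len(n))
seasonal <- rep(1 + 0.05 * sin(2 * pi * (1:12) / 12), length.out = n)
noise <- exp(rnorm(n, sd = 0.01))
y <- ts(trend * seasonal * noise, start = c(2010, 1), frequency = 12)

# Multiplicative model: y = trend * seasonal * random
dec_mult <- decompose(y, type = "multiplicative")

# Equivalent view: on the log scale the product becomes a sum
dec_log <- decompose(log(y), type = "additive")

# The components multiply back to the original series
# (NA at both ends, where the centred moving-average trend is undefined)
recon <- dec_mult$trend * dec_mult$seasonal * dec_mult$random
max_err <- max(abs(recon - y), na.rm = TRUE)
print(max_err)
```

Because the seasonal factor multiplies the level, the size of the seasonal swing scales with the price, which is the behaviour an additive model cannot represent without a log transform.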

Decomposed Time Series

  • Decomposition Plot
  • Adjusted Decomposition Plot
Code
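#time series data
# start/end are c(year, position within the year); with frequency = 252 the
# second number is a trading-day index within the year, not a calendar month
myts<-ts(df$XLY.Adjusted,frequency=252,start=c(2010,1), end = c(2023,3)) 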
#original plot for time series data
orginial_plot <- autoplot(myts,xlab ="Year", ylab = "Adjusted Closing Price", main = "Consumer Discretionary Sector Fund Stock price: JAN 2010 - March 2023")
#decompose the data
decompose = decompose(myts, "multiplicative")
#decomposition plot
autoplot(decompose)

Code
#adjusted plot
trendadj <- myts/decompose$trend
decompose_adjtrend_plot <- autoplot(trendadj,ylab='trend') +ggtitle('Adjusted trend component in the multiplicative time series model')
seasonaladj <- myts/decompose$seasonal
decompose_adjseasonal_plot <- autoplot(seasonaladj,ylab='seasonal') +ggtitle('Adjusted seasonal component in the multiplicative time series model')
grid.arrange(orginial_plot, decompose_adjtrend_plot,decompose_adjseasonal_plot, nrow=3)

The seasonally adjusted series (the data divided by the seasonal component) trends upward until 2019 and drops during the COVID-19 period, and it shows more variability than the original plot. The trend-adjusted series (the data divided by the trend component), in contrast, fluctuates more and shows no trend when compared to the original plot.

Lag Plots

  • Daily Time Lags
  • Monthly Time Lags
Code
#Lag plots 
gglagplot(myts, do.lines=FALSE, lags=1)+xlab("Lag 1")+ylab("Yi")+ggtitle("Lag Plot for Consumer Discretionary Sector Fund Stock JAN 2010 - March 2023")

Code
#monthly data: mean adjusted closing price for each calendar month
mean_data <- df %>% 
  mutate(month = month(Date), year = year(Date)) %>% 
  group_by(year, month) %>% 
  summarize(mean_value = mean(XLY.Adjusted), .groups = "drop")
month<-ts(mean_data$mean_value,start = c(2010, 1),frequency = 12)
#Lag plot
ts_lags(month)

The first lag plot shows the daily time lags of the Consumer Discretionary Sector Fund stock price from JAN 2010 to March 2023. The plot indicates that there is a strong positive correlation between the current value and the previous day’s value, as seen by the points clustering along the diagonal line. This suggests that the stock price has a positive autocorrelation at a lag of one day.

The second lag plot shows the monthly time lags of the mean value of the Consumer Discretionary Sector Fund stock price from JAN 2010 to March 2023. The plot indicates that there is a positive correlation between the current value and the value from the previous month. This suggests that the mean value of the stock price has a positive autocorrelation at a lag of one month.

Overall, the lag plots indicate that there is a positive autocorrelation present in the Consumer Discretionary Sector Fund stock price data, with the strongest correlation observed in the daily time series.
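What a lag-1 plot displays can be expressed numerically. The sketch below uses a synthetic random walk as a stand-in for a daily price series (an assumption, not the XLY data), pairs each value with the previous one, and compares the resulting correlation with the lag-1 value reported by base R's `acf()`.

```r
# Synthetic random-walk "price" (assumed for illustration, not XLY data)
set.seed(42)
price <- 100 + cumsum(rnorm(1000, mean = 0.05, sd = 1))

# A lag-1 plot pairs each value with the one before it
current <- price[-1]
previous <- price[-length(price)]
lag1_cor <- cor(current, previous)

# acf() reports a closely related quantity at lag 1
lag1_acf <- acf(price, lag.max = 1, plot = FALSE)$acf[2]

print(c(lag1_cor = lag1_cor, lag1_acf = lag1_acf))
# plot(previous, current) draws a base-R analogue of gglagplot(..., lags = 1)
```

Points that cluster tightly along the diagonal of a lag plot correspond to a lag-1 correlation close to 1, which is typical of price levels.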

Seasonality

  • Seasonal Heatmap
  • Seasonal Line plot
Code
# Create seasonal plot
ts_heatmap(month, color = "PuBu", title = 'Seasonality Heatmap of Consumer Discretionary Sector Fund Stock Jan 2010 - March 2023')
Code
# Create a line graph for each year with months on the x-axis
# (ggseasonplot() takes the ts object directly; it has no datecol/valuecol arguments)
ggseasonplot(month)+ggtitle("Seasonal Yearly Plot for Consumer Discretionary Sector Fund Stock Jan 2010 - March 2023")

The Seasonality Heatmap for the Consumer Discretionary Sector Fund Stock JAN 2010 - March 2023 does not reveal any clear seasonality in the data. The heatmap shows the mean value of the time series for each month and year combination, with the darker colors indicating higher values. The lack of clear patterns or darker colors in specific months or years suggests that there is no consistent seasonal pattern in the data. However, the yearly line graph shows a slight upward trend in the stock price from 2010 to 2023, but does not show any clear seasonality. Each year’s data is represented by a line, and the months are plotted on the x-axis. Overall, the lack of clear seasonality in both the heatmap and yearly line graph suggests that other factors beyond seasonality are driving the stock price fluctuations.

Moving Average

  • 4 Month MA
  • 1 Year MA
  • 3 Year MA
  • 5 Year MA
Code
#SMA Smoothing: ma() of order 5 spans the current month plus two months on each side
ma_plot <- autoplot(month, series="Data") +
  autolayer(ma(month,5), series="4 Month MA") +
  xlab("Year") + ylab("Mean Adjusted Closing Price") +
  ggtitle("Consumer Discretionary Sector Fund Stock JAN 2010 - March 2023 (4 Month Moving Average)") +
  scale_colour_manual(values=c("Data"="grey50","4 Month MA"="red"),
                      breaks=c("Data","4 Month MA"))
ma_plot

Code
#SMA Smoothing: ma() of order 13 spans the current month plus six months on each side
ma_plot <- autoplot(month, series="Data") +
  autolayer(ma(month,13), series="1 Year MA") +
  xlab("Year") + ylab("Mean Adjusted Closing Price") +
  ggtitle("Consumer Discretionary Sector Fund Stock JAN 2010 - March 2023 (1 Year Moving Average)") +
  scale_colour_manual(values=c("Data"="grey50","1 Year MA"="red"),
                      breaks=c("Data","1 Year MA"))
ma_plot

Code
#SMA Smoothing: ma() of order 37 spans the current month plus 18 months on each side
ma_plot <- autoplot(month, series="Data") +
  autolayer(ma(month,37), series="3 Year MA") +
  xlab("Year") + ylab("Mean Adjusted Closing Price") +
  ggtitle("Consumer Discretionary Sector Fund Stock JAN 2010 - March 2023 (3 Year Moving Average)") +
  scale_colour_manual(values=c("Data"="grey50","3 Year MA"="red"),
                      breaks=c("Data","3 Year MA"))
ma_plot

Code
#SMA Smoothing: ma() of order 61 spans the current month plus 30 months on each side
ma_plot <- autoplot(month, series="Data") +
  autolayer(ma(month,61), series="5 Year MA") +
  xlab("Year") + ylab("Mean Adjusted Closing Price") +
  ggtitle("Consumer Discretionary Sector Fund Stock JAN 2010 - March 2023 (5 Year Moving Average)") +
  scale_colour_manual(values=c("Data"="grey50","5 Year MA"="red"),
                      breaks=c("Data","5 Year MA"))
ma_plot

The four plots show the Consumer Discretionary Sector Fund stock prices from JAN 2010 to March 2023, along with the moving averages for 4 months, 1 year, 3 years and 5 years. As the window of the moving average increases, the trend line becomes smoother, reducing the impact of noise and fluctuations in the original time series.

The 4-month moving average plot shows frequent fluctuations in the stock price, with the trend line following the general direction of the time series. The 1-year moving average plot shows a smoother trend, following the overall upward trend of the stock price.

The 3-year moving average plot shows a similar trend to the 1-year plot but is even smoother, with fewer fluctuations. Finally, the 5-year moving average plot shows the smoothest trend, with an almost constant upward slope. As the moving average window increases, the smoother line makes the general trend of the Consumer Discretionary Sector Fund stock prices easier to identify. From the moving averages obtained above we can see that there is an upward trend in the stock price of the Consumer Discretionary Sector Fund.
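The effect of the window length can also be checked numerically. The sketch below is a base-R illustration on a synthetic trending series (an assumption, not the XLY data): it computes centred simple moving averages with `stats::filter()` for the same four orders and measures roughness as the standard deviation of month-to-month changes. The helper names `sma` and `roughness` are hypothetical.

```r
# Synthetic monthly series: linear trend plus noise (assumed, not XLY data)
set.seed(7)
n <- 158  # roughly 13 years of monthly values (assumed length)
x <- ts(50 + 0.8 * seq_len(n) + rnorm(n, sd = 5),
        start = c(2010, 1), frequency = 12)

# Centred simple moving average of odd order k (hypothetical helper)
sma <- function(series, k) stats::filter(series, rep(1 / k, k), sides = 2)

# Roughness = standard deviation of month-to-month changes (hypothetical helper)
roughness <- function(series) sd(diff(series), na.rm = TRUE)

windows <- c(5, 13, 37, 61)
rough <- sapply(windows, function(k) roughness(sma(x, k)))
names(rough) <- paste0("k=", windows)

raw_rough <- roughness(x)
print(round(c(raw = raw_rough, rough), 3))
```

A longer window averages over more observations, so the roughness should fall as the order grows, at the cost of losing more values at both ends of the series.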

Autocorrelation Time Series

  • ACF
  • PACF
  • ADF Test
Code
#ACF for  data
ggAcf(month)+ggtitle("ACF Plot for Consumer Discretionary Sector Fund Stock JAN 2010 - March 2023")

Code
#PACF for data
ggPacf(month)+ggtitle("PACF Plot for Consumer Discretionary Sector Fund Stock JAN 2010 - March 2023")

Code
#check the stationarity
tseries::adf.test(month)

    Augmented Dickey-Fuller Test

data:  month
Dickey-Fuller = -2.716, Lag order = 5, p-value = 0.2778
alternative hypothesis: stationary

The autocorrelation function (ACF) plot for the monthly data shows clear autocorrelation across the lags. Together with the lag plots above, this strong dependence on past values suggests that the series is not stationary. The Augmented Dickey-Fuller test supports this: the p-value (0.2778) is greater than 0.05, so we fail to reject the null hypothesis that the series has a unit root, that is, that it is non-stationary.
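For reference, the regression behind this test can be written as follows. This is a sketch of the standard form with a constant and a linear trend, which is the specification documented for `tseries::adf.test`; the symbols \(\alpha\), \(\beta\), \(\gamma\) and \(\delta_i\) are notation introduced here, and \(k = 5\) is the "Lag order" reported in the output above.

\[\Delta y_{t} = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{k} \delta_{i}\,\Delta y_{t-i} + \varepsilon_{t}, \qquad H_{0}: \gamma = 0 \ \text{(unit root)} \quad \text{vs.} \quad H_{1}: \gamma < 0 \ \text{(stationary)}\]

A large p-value means the data do not provide enough evidence against \(\gamma = 0\), so the series is treated as non-stationary.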

Detrend and Differenced Time Series

  • Linear Fitting Model
  • ACF Plot
Code
fit = lm(myts~time(myts), na.action=NULL) 
summary(fit) 

Call:
lm(formula = myts ~ time(myts), na.action = NULL)

Residuals:
    Min      1Q  Median      3Q     Max 
-44.408  -9.726  -2.247   6.546  59.145 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.359e+04  1.343e+02  -175.7   <2e-16 ***
time(myts)   1.174e+01  6.661e-02   176.3   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.33 on 3277 degrees of freedom
Multiple R-squared:  0.9046,    Adjusted R-squared:  0.9046 
F-statistic: 3.108e+04 on 1 and 3277 DF,  p-value: < 2.2e-16
Code
# plot ACFs of the original, linearly detrended, and first-differenced series
plot1 <- ggAcf(myts, 48) + ggtitle("Original Data: Consumer Discretionary Sector Fund Stock Price")
plot2 <- ggAcf(resid(fit), 48) + ggtitle("Detrended data")
plot3 <- ggAcf(diff(myts), 48) + ggtitle("First differenced data")
grid.arrange(plot1, plot2, plot3, nrow=3)

The estimated slope coefficient β1 is 11.74, with a standard error of 0.0666. Because time(myts) is measured in years, this is a statistically significant estimated increase of about 11.74 in the adjusted price per year. Subtracting the fitted line from the series gives the detrended series: \[\hat{y}_{t} = x_{t}+(2.359 \times 10^{4})-(11.74)t\]

From the above graph we can see that the ACF of the detrended data is essentially unchanged from the ACF of the original data, which means that removing a linear trend alone does not make the series stationary. When the first-order difference is applied, however, the high autocorrelation is removed, and no seasonal correlation remains.

As depicted in the above figure, the first-differenced series is now stationary and ready for further study.
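The contrast between linear detrending and differencing can be reproduced on simulated data. The sketch below uses a synthetic random walk with drift (an assumption standing in for a price series, not the XLY data) and compares the lag-1 autocorrelation of the original, detrended, and first-differenced versions. The helper `lag1` is hypothetical.

```r
# Synthetic random walk with drift (assumed for illustration, not XLY data)
set.seed(123)
n <- 2000
walk <- ts(100 + cumsum(rnorm(n, mean = 0.05, sd = 1)))

# Option 1: remove a fitted linear trend
fit_lin <- lm(walk ~ time(walk))
detrended <- resid(fit_lin)

# Option 2: take the first difference
differenced <- diff(walk)

# Lag-1 autocorrelation of a series (hypothetical helper)
lag1 <- function(v) acf(v, lag.max = 1, plot = FALSE)$acf[2]

result <- c(original    = lag1(walk),
            detrended   = lag1(detrended),
            differenced = lag1(differenced))
print(round(result, 3))
```

For a series of this kind, subtracting a straight line leaves the stochastic trend in place, so the autocorrelation stays high, while differencing removes it.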